SYSTEM AND APPARATUS FOR SEQUENTIAL PROCESSING OF ANALYTES



Field of the Invention 
The invention relates generally to systems and apparatus for carrying out large scale parallel reactions on solid phase supports, and more particularly, to systems and apparatus for monitoring and carrying out reactions on arrays of microparticles.



BACKGROUND

The desire to understand and analyze complex chemical and biological systems has led to the development of analytical techniques that employ parallelization and miniaturization of analyte processing, e.g. Graber et al, Current Opinion in Biotechnology, 9: 14-18 (1998); Fodor et al, Nature, 364: 555-556 (1993); Meier-Ewert et al, Nature, 361: 375-376 (1993); Taylor et al, Nucleic Acids Research, 25: 3164-3168 (1997); Gamer et al, BioTechniques, 14: 112-115 (1993); Lam et al, Nature, 354: 82-84 (1991); Ohlmeyer et al, Proc. Natl. Acad. Sci., 90: 10922-10926 (1993); DeRisi et al, Science, 278: 680-686 (1997); Wodicka et al, Nature Biotechnology, 15: 1359-1367 (1997); and the like.

Many of these techniques employ microparticles for synthesizing analytes or for capturing analytes for subsequent analysis, e.g. Lam et al (cited above); Benkovic et al, International patent application PCT/US95/03355; Gavin et al, International patent application PCT/EP97/02039; Brenner et al, International patent application PCT/US96/09513, and the like. Even though the properties of different types of microparticles can vary widely, microparticles generally facilitate the construction and manipulation of large repertoires of analytes with minimal reagent and/or sample consumption. However, handling and manipulating large numbers of microparticles, e.g. tens to hundreds of thousands, for carrying out specific chemical and/or biochemical analyses gives rise to many difficulties, including whether sufficient signal is generated on individual microparticles for detection, how to track individual microparticles through multiple steps of a process, the mechanical strength of microparticles under pressure or flow conditions, the ability to uniformly deliver reagents to microparticles for carrying out steps of an analytical process, whether clumping or other inappropriate interaction of microparticles and/or reagents occurs, the degree to which analytes and/or processing reagents adsorb onto vessel walls, whether protein reagents or analytes denature, causing a disruption of reagent distribution and access, whether adjacent microparticles will interact, e.g. to degrade or obscure a signal or to inhibit reagent access, and the like.

In view of these difficulties, it would be desirable to provide a system and apparatus for handling and processing multiple solid phase supports, such as populations of microparticles. It would be especially desirable if such a system and apparatus permitted the tracking and analysis of multiple analytes anchored to separate microparticles through a sequence of several processing and/or analysis steps.



Accordingly, objects of our invention include, but are not limited to, providing a system and apparatus for sequentially delivering reagents to a population of analytes anchored to separate microparticles; providing an apparatus for simultaneously monitoring the interactions of processing reagents and analytes on the surfaces of microparticles disposed in a planar array; providing an apparatus for detecting optical signals generated by, or as the result of, interactions of processing reagents and analytes on the surfaces of microparticles disposed in a planar array; providing an apparatus for detecting pluralities of optical signals, each such plurality being generated at the surface of the same microparticle as a result of interactions between processing reagents and an analyte anchored to the surface of such microparticle; providing an apparatus for simultaneously tracking the positions of individual microparticles in a population of microparticles disposed in a flow chamber as a closely packed planar array; and providing a system and apparatus for simultaneously analyzing the nucleotide sequences of a population of polynucleotides anchored to microparticles disposed in a planar array in a flow chamber.



Summary of the Invention

Our invention achieves these and other objects with an apparatus comprising a flow chamber for disposing a population of microparticles in a planar array; fluidic means for sequentially delivering processing reagents from one or more reagent reservoirs to the flow chamber; and detection means for detecting a sequence of optical signals from each of the microparticles of the population. Preferably, the sequences of optical signals are generated as a result of a multi-step analytical process, such as nucleic acid sequence analysis.

In one aspect, the invention provides a system for simultaneously monitoring a population of analytes which includes the apparatus of the invention, microparticles carrying the analytes, and software means for processing images of, and/or optical signals generated by, the microparticles when disposed in a planar array. Preferably, the flow chamber includes constraining means for restricting the movement of microparticles during cycles of reagent delivery.

In another aspect, the invention includes a system for simultaneously analyzing the nucleotide sequences of a population of polynucleotides. Copies of each kind of polynucleotide in the population are sorted onto and anchored to one or more microparticles so that a population of loaded microparticles is formed. Loaded microparticles are disposed in a planar array in a flow chamber through which processing reagents are sequentially delivered to the loaded microparticles from one or more reagent reservoirs by a fluidic means. Optical signals generated by, or produced as a result of, the interaction of processing reagents and polynucleotides on the microparticles are imaged by a detection means. Preferably, when analysis includes determining the nucleotide sequence of a portion of each polynucleotide on the different microparticles, massively parallel signature sequencing (MPSS) analysis is employed, e.g. as described in Albrecht et al, International patent application PCT/US97/09472.

Brief Description of the Drawings 
Figure 1a is a schematic representation of a flow chamber and fluidics and detection systems for observing a planar array of microparticles loaded with analyte molecules, such as cDNA molecules for sequencing.

Figure 1b is a schematic of a preferred holder for a flow chamber.

Figure 2a is a bilateral cut-away view of a flow chamber.

Figure 2b is a top view of a flow chamber.

Figure 2c is an illustration of microparticles being loaded into a flow chamber.

Figures 3a through 3d schematically illustrate microparticle constraining means for a flow chamber.

Figure 4 is a schematic representation of a device for loading microparticles into a flow chamber.

Figure 5 is a schematic representation of a fluidics system for use with the invention.

Figures 6a and 6b schematically illustrate top-lighting and back-lighting approaches for determining microparticle centers in an array.

Figure 7 schematically illustrates the assignment of pixels to microparticles for data processing.

Figure 8 is a flow chart summarizing operation of the system of the invention.

Definitions

"Complement" or "tag complement" as used herein in reference to oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments where specific hybridization results in a triplex, the oligonucleotide tag may be selected to be either double stranded or single stranded. Thus, where triplexes are formed, the term "complement" is meant to encompass either a double stranded complement of a single stranded oligonucleotide tag or a single stranded complement of a double stranded oligonucleotide tag.

The term "oligonucleotide" as used herein includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of specifically binding to a target polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens of monomeric units, e.g. 40-60. Whenever an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'→3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless otherwise noted. Usually oligonucleotides of the invention comprise the four natural nucleotides; however, they may also comprise non-natural nucleotide analogs. It is clear to those skilled in the art when oligonucleotides having natural or non-natural nucleotides may be employed; e.g. where processing by enzymes is called for, usually oligonucleotides consisting of natural nucleotides are required.

"Perfectly matched" in reference to a duplex means that the poly- or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick base pairing with a nucleotide in the other strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may be employed. In reference to a triplex, the term means that the triplex consists of a perfectly matched duplex and a third strand in which every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a base pair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse Hoogsteen bonding.
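For the simplest case, a duplex of natural nucleotides, the "perfectly matched" definition can be checked mechanically. The sketch below is a minimal illustration, assuming both strands are written 5'→3' (the convention stated above) and contain only A, C, G and T; it does not handle nucleoside analogs such as deoxyinosine, nor triplexes. The function names are hypothetical.

```python
# Watson-Crick partners of the four natural DNA nucleotides (analogs not handled).
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return, written 5'->3', the strand that pairs perfectly with `seq`."""
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def is_perfectly_matched(strand_a: str, strand_b: str) -> bool:
    """True if every nucleotide of each strand Watson-Crick pairs with the other.

    Both strands are given 5'->3'; because the strands of a duplex are
    antiparallel, strand_b must read as the reverse complement of strand_a.
    """
    return strand_b.upper() == reverse_complement(strand_a)


print(is_perfectly_matched("ATGCCTG", "CAGGCAT"))  # True
print(is_perfectly_matched("ATGCCTG", "CAGGCAA"))  # False: one mismatch
```

Because the comparison is against the reverse complement, a single unpaired position anywhere in either strand makes the check fail, which mirrors the "every nucleotide" requirement of the definition.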

As used herein, "nucleoside" includes the natural nucleosides, including 2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. as described by Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlmann and Peyman, Chemical Reviews, 90: 543-584 (1990), or the like, with the only proviso that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to enhance binding properties, reduce complexity, increase specificity, and the like.

As used herein "sequence determination" or "determining a nucleotide sequence" in reference to polynucleotides includes determination of partial as well as full sequence information of the polynucleotide. That is, the term includes sequence comparisons, fingerprinting, and like levels of information about a target polynucleotide, as well as the express identification and ordering of nucleosides, usually each nucleoside, in a target polynucleotide. The term also includes the determination of the identification, ordering, and locations of one, two, or three of the four types of nucleotides within a target polynucleotide. For example, in some embodiments sequence determination may be effected by identifying the ordering and locations of a single type of nucleotide, e.g. cytosines, within the target polynucleotide "CATCGC ..." so that its sequence is represented as a binary code, e.g. "100101 ..." for "C-(not C)-(not C)-C-(not C)-C ..." and the like.
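The single-nucleotide-type representation in the example above can be reproduced with a short helper. This is a hypothetical illustration only: it derives the code from an already known sequence, whereas in practice the positions of the chosen nucleotide would come from the analytical process itself.

```python
def single_base_code(seq: str, base: str = "C") -> str:
    """Represent `seq` as a binary string: '1' where `base` occurs, '0' elsewhere."""
    target = base.upper()
    return "".join("1" if nt == target else "0" for nt in seq.upper())


print(single_base_code("CATCGC"))       # 100101, as in the example above
print(single_base_code("CATCGC", "A"))  # 010000
```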

As used herein, the term "complexity" in reference to a population of polynucleotides 
means the number of different species of molecule present in the population. 
DETAILED DESCRIPTION OF THE INVENTION
The system and apparatus of the invention are particularly applicable to the analysis of molecules that can be anchored in populations of duplicate copies to particulate solid phase supports. That is, in accordance with the invention, each analyte of a population is present on at least one microparticle in a quantity sufficient for the type of analysis being performed. For example, if combinatorially synthesized peptides on the microparticles are screened against a soluble receptor protein for detecting those that form stable complexes, the number of peptides available for binding on the surface of the microparticles must be large enough to generate a detectable signal when a binding event occurs. Of course, many additional factors well known in the art will present additional design constraints, such as the nature of the system for generating optical signals, the concentration of receptors, pH, salt concentration, the density and accessibility of the peptides on the microparticle surface, the solvent system employed, and the like. Analyte populations particularly relevant for use with the present apparatus include combinatorial libraries synthesized on microparticle supports, e.g. as disclosed in Lam et al, Chem. Rev., 97: 411-448 (1997), or Dower et al, U.S. patent 5,708,153; and polynucleotide libraries sorted onto microparticle supports, e.g. as disclosed in Brenner (cited above).

Figure 1a is a schematic representation of an embodiment of the invention for detecting fluorescent signals. Flow chamber (100), having inlet (102), outlet (104) and planar cavity (106), holds microparticles in a planar array from which optical signals (108) generated by analytes and/or reactants on microparticles can be collected and imaged. Flow chamber (100) is operationally associated with fluidic system (112) and detection system (114), so that delivery of fluids and collection of signals is under control of computer (116). Preferably, optical signals are collected by microscope (118) and are imaged onto a solid state imaging device, such as charge-coupled device (CCD) (120), which is capable of generating a digital image of the physical image of the microparticle array with sufficient resolution for individual microparticles to be distinguished. For fluorescent signals, detection system (114) usually includes appropriate bandpass filter (122) for optical signal (108), bandpass filter (124) for excitation beam (128) generated by light source (126), and other standard components. As illustrated, a conventional fluorescence microscope configured for epiillumination is preferred. There is a great deal of guidance in the art for selecting appropriate fluorescence microscopes, e.g. Wang and Taylor, editors, Fluorescence Microscopy of Living Cells in Culture, Parts A and B, Methods in Cell Biology, Vols. 29 and 30 (Academic Press, New York, 1989).

A key feature of the invention is flow chamber (100). Body (130) of flow chamber (100) preferably comprises inlet (102), outlet (104) and planar cavity (106), which are formed by standard micromachining techniques, e.g. Ekstrom et al, International patent application PCT/SE91/00327; Brown, U.S. patent 4,911,782; Harrison et al, Anal. Chem. 64: 1926-1932 (1992); and the like. Transparent plate (132) is sealingly attached to body (130) to form an




wo 98/53300 




PCT/US98/11224 



operational flow chamber (100). Body (130) may be constructed from any of several different materials including glass, silicon, polyethylene, polyester, Teflon, other plastics, and the like. Preferably, transparent plate (132) is glass or quartz; and, when body (130) and transparent plate (132) are glass or silicon, transparent plate (132) is preferably attached to body (130) by anodic bonding, e.g. Pomerantz, U.S. patent 3,397,279. Key functions of the flow chamber include i) holding a population of microparticles in a substantially immobilized planar array, or monolayer, during a sequence of processing steps, ii) ensuring that processing reagents can access each microparticle during each step of a process, and iii) minimizing processing reagent usage. The degree of immobilization required may vary among different embodiments. Generally, more movement of microparticles within a planar array increases the computational and measurement burden of tracking positions of microparticles by image processing software. Design trade-offs therefore exist between the use of image processing software and the use of physical and/or chemical means for constraining microparticle movement. Preferably, physical and/or chemical means are employed to constrain microparticle movement within the planar array of microparticles in flow chamber (100). Such means are referred to herein as "movement constraining means." Most preferably, physical, or mechanical, movement constraining means are employed.

Preferably, microparticles are disposed in flow chamber (100) in a closely packed planar array. As used herein, "closely packed" in reference to a planar array means either that the number of microparticles per unit area of a planar array is at least eighty percent of the number of microparticles in a hexagonal array of equal area, or that the average distance between centers of adjacent microparticles is less than two microparticle diameters. As used herein, a "hexagonal" array of microparticles means a planar array of microparticles in which every microparticle in the array contacts at least six other adjacent microparticles, as shown in Figure 3a.
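The two alternative "closely packed" criteria can be made concrete with a little geometry. In a hexagonal array of touching equal microparticles of diameter d, each particle occupies a rhombic cell of area (√3/2)d², so the reference density is 2/(√3 d²). The sketch below is a hypothetical helper, assuming the particle count, occupied area and mean adjacent-center distance have already been measured (e.g. by image analysis); units need only be consistent.

```python
import math
from typing import Optional


def hexagonal_density(diameter: float) -> float:
    """Microparticles per unit area in a hexagonal array of touching equal particles.

    Each particle occupies a rhombic cell of area (sqrt(3)/2) * d**2.
    """
    return 2.0 / (math.sqrt(3.0) * diameter ** 2)


def is_closely_packed(count: int, area: float, diameter: float,
                      mean_center_distance: Optional[float] = None) -> bool:
    """Apply the two alternative 'closely packed' criteria defined above.

    Lengths and areas must use consistent units (e.g. um and um^2).
    """
    # Criterion 1: density is at least 80% of the hexagonal density.
    dense_enough = count / area >= 0.8 * hexagonal_density(diameter)
    # Criterion 2: mean distance between adjacent centers is under two diameters.
    near_enough = (mean_center_distance is not None
                   and mean_center_distance < 2.0 * diameter)
    return dense_enough or near_enough


# 5 um particles: about 462 fit hexagonally in 10,000 um^2, so 80% is about 370.
print(is_closely_packed(400, 10_000.0, 5.0))  # True
print(is_closely_packed(300, 10_000.0, 5.0))  # False
print(is_closely_packed(300, 10_000.0, 5.0, mean_center_distance=9.0))  # True
```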

Additional features of flow chamber (100) of a preferred embodiment are illustrated in Figures 2a through 2c. Figure 2a is a cross-sectional view along a longitudinal plane that bisects flow chamber (100). The same view, in a more abstracted rendition, is shown in Figure 2c. In both Figures, inlet (102) fluidly communicates with planar cavity (106) and outlet (104). Microparticles (200) carrying analytes enter inlet (102) and are carried by a suspending buffer to planar cavity (106), where they become packed against dam (202), which prevents the microparticles from exiting the flow chamber through outlet (104). Structurally, dam (202) may be formed by a sudden reduction of the vertical dimension of planar cavity (106). Preferably, vertical dimension (204) of planar cavity (106) is selected so that microparticles (200) are constrained to a plane, i.e. a monolayer, when they pack against dam (202). More preferably, vertical dimension (204) is selected to be between about 120 and 150 percent of the diameter of the microparticles employed. For example, when microparticles are employed that have diameters of 5 µm, vertical dimension (204) may be 7 µm. Magnetic microparticles may be constrained to a plane and constrained from movement by applying a magnetic field so that the microparticles are attracted to the ceiling or to the floor of planar cavity (106). Width (206) of planar cavity (106) is not a critical dimension; however, for convenience and efficiency, width (206) may be selected to correspond to the dimensions of the signal collection region of detection system (114). Such regions, labeled 1 through k in Figure 2b, are referred to herein as "tiles." That is, the region of planar cavity (106) occupied by microparticles may be divided into non-overlapping areas, referred to as "tiles," that cover the entire occupied region. Figure 2b, which is a top view of the flow chamber of Figure 2a, also shows inlet (102), planar cavity (106), dam (202), and outlet (104) that lie in sequence along axis (217) of flow chamber (100).
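Two of the dimensional rules above lend themselves to a quick calculation: the preferred cavity height of about 120 to 150 percent of the microparticle diameter, and the division of the occupied region into non-overlapping tiles matched to the detector's collection region. The sketch below is illustrative only; the function names and the region and tile sizes in the usage are hypothetical, since the document does not give the detector field of view.

```python
import math


def cavity_height_range(diameter_um: float) -> tuple[float, float]:
    """Preferred vertical dimension (204): about 120-150% of particle diameter."""
    return 1.2 * diameter_um, 1.5 * diameter_um


def tile_layout(region_w: float, region_h: float,
                tile_w: float, tile_h: float) -> list[tuple[int, float, float]]:
    """Cover the occupied region with non-overlapping tiles numbered 1..k.

    Returns (tile_number, x_origin, y_origin) in row-major order; tiles on the
    last row or column may extend past the region so that it is fully covered.
    """
    cols = math.ceil(region_w / tile_w)
    rows = math.ceil(region_h / tile_h)
    return [(r * cols + c + 1, c * tile_w, r * tile_h)
            for r in range(rows) for c in range(cols)]


low, high = cavity_height_range(5.0)
print(low <= 7.0 <= high)  # True: the 7 um example lies in the preferred range

# Hypothetical 1000 x 300 um occupied region imaged in 250 x 200 um fields.
print(len(tile_layout(1000, 300, 250, 200)))  # 8
```

When width (206) is chosen to match one field of view, as suggested above, `cols` is 1 and the tiles simply run along axis (217).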



Many movement constraining means may be selected for use with the flow chamber, either alone or in combination. Such means include loading microparticles with trace amounts of a chemically reactive species which may be activated and cross-linked; providing physical, or mechanical, structures, such as ridges, within the flow chamber; providing magnetically responsive microparticles which may be immobilized by an external magnetic field; providing a second population of microparticles that are loaded into a flow chamber after the analyte-containing population, which forces the analyte-containing population against dam (202); and the like. Exemplary chemically reactive species for use with nucleic acid analytes are disclosed in Summerton et al, U.S. patent 4,123,610; Gamper et al, J. Mol. Biol., 197: 349-362 (1987); Hearst, Ann. Rev. Phys. Chem. 39: 291-315 (1988); Pieles et al, Nucleic Acids Research, 17: 8967-8978 (1989); and the like.

Preferably, microparticle movement is constrained by providing a flow chamber with planar cavity (106) containing a plurality of ridges running parallel to axis (217) of the flow chamber, i.e. parallel to the direction of reagent flow, so that microparticles are arranged into rows, which may be single-file, or several microparticles wide, as shown in Figures 3a and 3b. The particular selection may depend on several factors, including the degree of immobilization desired, constraints imposed by the fabrication technique used to construct the flow chamber, the amount of reagent access desired, the degree to which flow resistance or back-pressure can be tolerated, and the like. Figures 3a and 3b illustrate two possible distances between parallel ridges. In Figure 3a, the distance is selected to permit maximal packing of microparticles into a hexagonal array, and in Figure 3b, the distance is selected for less efficient packing, but for increased reagent access to microparticle surfaces. Figures 3c and 3d are axial views of the flow chamber showing the microparticle arrangements of Figures 3a and 3b, respectively.

In some embodiments, such as those employing enzymatic processes, the inner surfaces of flow chamber (100) may be passivated, that is, treated to render such surfaces inert and/or non-adsorbing with respect to enzymes. The type of treatment depends on the sensitivity of the enzymes used in the process, and their affinity for the surfaces. Surface treatments include silanization, e.g. with commercially available reagents (Pierce, Rockford, IL); and/or adsorption of various blocking polymers, such as poly-a-alanine, polyglycine, polyadenylic acid, polymaleimide, polyvinylpyrrolidone, or the like, e.g. Shoffner et al, Nucleic Acids Research, 24: 375-379 (1996). Preferably, glass inner surfaces of flow chamber (100) are covalently coated with a neutral coating, such as allyl methacrylate, using the technique disclosed in Sandoval et al, U.S. patent 5,326,738, which is incorporated by reference.

Figure 1b illustrates flow chamber (100) mounted between holders (140) and (142), which sealingly connect inlet (102) to inlet tubing (144) and outlet (104) to outlet tubing (146), respectively. Preferably, holder (140) contains a rotary valve (not shown) operated by actuator (148) that shunts fluid flowing through inlet tubing (144) to inlet (102) or to waste line (150). Such a valve minimizes the amount of process reagent from a previous step that must be passed through flow chamber (100) prior to the initiation of the next process step. That is, such a rotary valve permits reagent in inlet tubing (144) to be shunted to waste and replaced by processing reagent required for the next step in the process being executed. Preferably, for use in DNA analysis, Peltier block (152) is employed to control temperature in flow chamber (100), and the entire assembly including flow chamber (100) and Peltier block (152) is mounted on xyz-stage (154), which is under control of computer (116).

Preferably, microparticles are loaded into flow chamber (100) prior to attachment of holders (140) and (142) and the initiation of processing steps. Figure 4 illustrates a microparticle loader for loading microparticles into flow chamber (100). Flow chamber (100) is mounted between holders (400), (402), (404), and (406). Holders (400) and (402) sealingly clamp onto the inlet end (101) of flow chamber (100) and holders (404) and (406) sealingly clamp onto the outlet end (103) of flow chamber (100) so that inlet tubing (408) is in fluid communication with outlet tubing (410) when the microparticle loader is assembled. Inlet tubing (408) is connected to syringe (416), which is used to drive fluid through flow chamber (100). Holder (400) is constructed to have conical passage (412), which narrows to match the diameter of inlet (102) of flow chamber (100). After assembly of holders (400), (402), (404), and (406), a suspension of microparticles is placed in the conical passage, after which fitting (414) is sealingly connected to holder (400). Fluid pressure and flow generated by syringe (416) then drive the microparticles into planar cavity (106) and against dam (202). In a preferred embodiment which employs 5 µm diameter GMA microparticles carrying DNA, approximately 500 thousand microparticles are loaded into flow chamber (100) by placing 5 µL of a 100 thousand microparticle/µL suspension (TE buffer, pH 8.0, Sambrook et al, Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York, 1989)) in conical passage (412), attaching fitting (414), and using syringe (416) to drive the microparticles through inlet (102) and into planar cavity (106). After loading, holders (400), (402), (404), and (406) are removed from flow chamber (100), which is then mounted on the apparatus as shown in Figure 1b.
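The loading figures above can be checked arithmetically, and combined with the hexagonal cell area to estimate how much of planar cavity (106) the loaded microparticles would cover. The area estimate is an idealization (a perfect hexagonal monolayer); real arrays are less dense, and the document does not state the occupied area. The helper names are hypothetical.

```python
import math


def particles_loaded(volume_ul: float, particles_per_ul: float) -> float:
    """Number of microparticles delivered by a given volume of suspension."""
    return volume_ul * particles_per_ul


def monolayer_area_mm2(count: float, diameter_um: float) -> float:
    """Idealized area (mm^2) of a hexagonally packed monolayer of `count` particles."""
    cell_area_um2 = (math.sqrt(3.0) / 2.0) * diameter_um ** 2  # area per particle
    return count * cell_area_um2 / 1e6  # 1 mm^2 = 1e6 um^2


n = particles_loaded(5, 100_000)
print(n)                                     # 500000
print(round(monolayer_area_mm2(n, 5.0), 1))  # 10.8
```

Under this idealization, roughly 11 mm² of the cavity would be occupied, which in turn sets how many tiles must be imaged per cycle.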

Preferably, process reagents are delivered to flow chamber (100) by the fluidic system illustrated in Figure 5, which has the capacity to handle many different reagents for complex analytical processes. In the illustrated embodiment, which is used in connection with DNA sequencing, the fluidics system may accommodate up to 38 reagents, including wash buffers,







rinses, enzymes, hybridization probes, adaptors, and the like. Preferably, the function of the fluidics system is the sequential metering of selected processing reagents to flow chamber (100). Inlet (102) of flow chamber (100) is sealingly connected to holder (140), which contains the rotary valve operated by actuator (148) (not shown in Figure 5). The function of the rotary valve is described above. A variety of means may be employed for moving processing reagents from reservoirs, through tubing, and into flow chamber (100), including gravity feed, pressure feed, and pumps, e.g. peristaltic, syringe, and the like. Preferably, common syringe pump (500) is employed for removing predetermined amounts of processing reagents from reservoirs and for forcing such reagents through flow chamber (100) at a predetermined flow rate. Under control of computer (116), pump (500), in operational association with valve block (502) and rotary valve (504), removes a predetermined amount of processing reagent from a selected reservoir by siphoning reagent out of the reservoir on the out-stroke of plunger (501) of pump (500). On the in-stroke of plunger (501), rotary valve (504) directs processing reagent from tubing (503) to reservoir (505) of pump (500). On the out-stroke of plunger (501), the state of rotary valve (504) is changed to direct processing reagent from reservoir (505) to inlet tubing (144). Tubing (503) connects rotary valve (504) with manifold (508) which, in turn, is connected to a plurality (five shown) of banks of zero dead volume valves (506). Zero dead volume valves (506) connect individual reservoirs holding processing reagents to a common passageway (not shown in Figure 5) that runs through each of the banks of valves connecting to manifold (508).

A preferred zero dead volume valve is described in U.S. patents 4,558,845 and 4,703,913, which are incorporated by reference. Process reagents from reservoirs (514) are distributed to the banks of zero dead volume valves by way of manifold (510). Alternative valve blocks for controlling delivery of process reagents to flow chamber (100) include the valve matrix disclosed in U.S. patent 5,203,368.
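The sequential metering described above can be summarized as an ordered list of valve and pump actions per reagent. The sketch below only logs those actions; it is a hypothetical illustration, not a driver for any real valve or pump hardware. Reference numerals follow Figures 1b and 5, while the reagent-to-valve mapping, volumes, flow rate and the size of the flush sent to waste are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FluidicsSketch:
    """Hypothetical controller that logs the actions for each reagent delivery.

    Reference numerals follow Figures 1b and 5; the valve numbering of the
    reservoirs and all volumes/flow rates are illustrative assumptions.
    """
    reservoir_valves: dict[str, int]  # reagent name -> zero dead volume valve
    log: list[str] = field(default_factory=list)

    def deliver(self, reagent: str, volume_ul: int, flow_ul_min: int,
                flush_ul: int = 0) -> None:
        if not 0 <= flush_ul < volume_ul:
            raise ValueError("flush volume must be smaller than the aliquot")
        valve = self.reservoir_valves[reagent]
        # Meter a predetermined aliquot into pump reservoir (505).
        self.log.append(f"open valve {valve}")
        self.log.append("rotary valve 504 -> tubing 503")
        self.log.append(f"aspirate {volume_ul} uL into reservoir 505")
        self.log.append(f"close valve {valve}")
        # Push the aliquot toward the flow chamber.
        self.log.append("rotary valve 504 -> inlet tubing 144")
        if flush_ul:
            # Shunt the previous step's reagent left in inlet tubing (144) to waste.
            self.log.append(f"shunt valve 148 -> waste line 150, push {flush_ul} uL")
        self.log.append("shunt valve 148 -> inlet 102")
        self.log.append(f"dispense {volume_ul - flush_ul} uL at {flow_ul_min} uL/min")


protocol = [("ligase", 100), ("wash", 200)]  # hypothetical two-step cycle
ctrl = FluidicsSketch({"wash": 3, "ligase": 7})
for reagent, volume in protocol:
    ctrl.deliver(reagent, volume, flow_ul_min=50, flush_ul=20)
print("\n".join(ctrl.log))
```

Keeping the flush ahead of each dispense reflects the stated purpose of the rotary valve in holder (140): reagent from the previous step that is still in inlet tubing (144) goes to waste rather than through flow chamber (100).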

An important feature of detection means (114) of the invention is the ability to keep track of individual microparticles through multiple process steps and/or cycles. In connection with such tracking, detection means (114) periodically records optical characteristics of individual microparticles that provide a close approximation of microparticle centers. Preferably, when trans-illumination, or "back lighting," of flow chamber (100) is possible, the optical characteristic is the focused back light from the microparticles. That is, in reference to Figure 6a, back light (600) passes vertically through flow chamber (100), where it is focused by microparticles (602) onto focal plane (604). The image of focal plane (604) in this configuration appears as a field of bright points, where each point is located at the approximate center of its corresponding microparticle. In an epiillumination system, light from above flow chamber (100), i.e. "top light (610)," is directed from a vertical direction onto microparticles (602), where it scatters from the top surface of the microparticles. In this configuration, the optical characteristic is the scatter center of a microparticle. Thus, an image is collected from the plane containing scatter centers (612) resulting from such top lighting. As with focused back lighting, the image of the scatter centers provides a convenient way to readily determine the approximate centers of the microparticles.
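The document does not specify how the bright points or scatter centers are located in the collected image. One common approach, assumed here, is to treat each pixel that exceeds a threshold and is strictly brighter than its eight neighbours as one microparticle center. The sketch is a hypothetical, pixel-level illustration with no sub-pixel refinement.

```python
def find_bright_points(image: list[list[float]],
                       threshold: float) -> list[tuple[int, int]]:
    """Return (row, col) of pixels that exceed `threshold` and all 8 neighbours.

    Each such local maximum is taken as the approximate center of one
    microparticle (a focused back-light spot or a top-light scatter center).
    """
    rows, cols = len(image), len(image[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value < threshold:
                continue
            neighbours = [image[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            # Strict comparison: flat-topped spots are ignored in this sketch.
            if all(value > n for n in neighbours):
                centers.append((r, c))
    return centers


frame = [[1, 1, 1, 1, 1],
         [1, 9, 2, 1, 1],
         [1, 2, 1, 1, 1],
         [1, 1, 1, 8, 1],
         [1, 1, 1, 1, 1]]
print(find_bright_points(frame, threshold=5))  # [(1, 1), (3, 3)]
```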

In the preferred image processing approach, once microparticle centers (700) are
determined, pixels (702) are assigned for determining characteristics, e.g. intensity, of an optical
signal generated at each microparticle (602). The size of microparticle (602) and the pixel area
determine how many pixels are assigned to each microparticle. In making such an assignment,
important factors include the degree to which the calculated center of a microparticle (as described
above) is likely to deviate from the geometric center, the extent to which optical signal collected
from the edge of an image contains spurious information (e.g. signal from an overlapping or
adjacent microparticle), the uniformity of microparticle diameter and shape, and the like. In the
preferred apparatus of the invention, 5 µm diameter microparticles are employed and the pixel
dimensions of the CCD detector are about 0.9 µm x 0.9 µm. Thus, nine pixels fit easily within the
interior of a microparticle image with a margin of at least about 1 µm between any pixel and the
edge of the microparticle image. In the preferred embodiment, an initial pixel is assigned which
encloses the computed center of a microparticle, e.g. pixel "5" in Figure 7. Thereafter, additional
pixels are assigned, usually the immediately adjacent pixels. Preferably, the value of the optical
signal generated by a process at the surface of a microparticle is the average value of the optical
signals collected by pixels assigned to that microparticle.
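A minimal sketch of the pixel assignment and averaging described above, assuming the computed center is given in micrometres in the image frame, the image is a list of rows of intensities, and the "immediately adjacent pixels" are the eight neighbours of the center pixel (a 3x3 block). The function names and the handling of beads at the image edge are illustrative assumptions, not part of the described apparatus.

```python
import math

PIXEL_UM = 0.9  # CCD pixel pitch given in the text (about 0.9 µm x 0.9 µm)


def center_pixel(x_um, y_um, pixel_um=PIXEL_UM):
    """Return (row, col) of the pixel that encloses a computed bead center."""
    return math.floor(y_um / pixel_um), math.floor(x_um / pixel_um)


def bead_signal(image, x_um, y_um, pixel_um=PIXEL_UM):
    """Average intensity of the center pixel and its eight neighbours.

    `image` is a list of rows of intensities. Neighbours falling outside the
    image are skipped, an assumption the text does not address.
    """
    r0, c0 = center_pixel(x_um, y_um, pixel_um)
    values = [
        image[r][c]
        for r in range(r0 - 1, r0 + 2)
        for c in range(c0 - 1, c0 + 2)
        if 0 <= r < len(image) and 0 <= c < len(image[0])
    ]
    return sum(values) / len(values)
```

With 0.9 µm pixels, the 3x3 block spans 2.7 µm, which sits inside the image of a 5 µm bead.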



The general operation of the system of the preferred embodiment is summarized by the
flow chart of Figure 8. At the start (800) of an analysis, microparticles with anchored analytes
have been loaded into flow chamber (100), which has been operationally mounted in holders (140)
and (142). The initial operation is the calibration of the microparticle focal plane (802). That is, the
vertical, or "z", position of the xyz-stage is determined which optimizes the focus of either the
scatter centers of the microparticles, i.e. the microparticle tops for top lighting, or the focus points
of the microparticles for back lighting. The optimization is carried out by a conventional
autofocusing algorithm which provides an image contrast function constructed from a
predetermined sample of regions within a collected image. For example, the contrast function
may be evaluated iteratively for a sequence of z-positions so that the differences of successive
values of the contrast function can be determined. These differences are tested until one is found
below a predetermined threshold, which is taken as the maximum of the contrast function. The
focal plane location is taken as the z position which maximizes the image contrast function. Such
calibration is carried out for each tile, if more than one tile is employed, so that a correction table
is constructed of the changes in stage settings, relative to the settings of the first tile, that are
required to bring the system into focus upon translation to subsequent tiles. These values are
stored by computer (116).

After calibration, process steps are initiated (804) by way of a fluidics controller
operationally associated with computer (116). After process steps (804) are completed, stage
settings are adjusted to place the first tile into focus using the autofocus algorithm (806), which
places the focal plane of the microscope objective approximately at the tops of the microparticles.
Stage settings are then adjusted (808) to bring the focal plane of the microscope objective to the
approximate centers of the microparticles, as illustrated (606) in Figures 6a and 6b. The amount
of stage movement in this re-focusing depends on the diameter of the microparticles being used.
After appropriate selection of filters (124) and (122), a fluorescent image of the first tile is
collected (810) and transferred to data server (812). Fluorescent images are collected on the plane
of the microparticle centers because of imperfections in the planar array. That is, microparticles in
planar cavity (106) do not lie in a perfect planar array for a variety of reasons. For example, some
microparticles are elevated above others as a result of packing into the flow chamber; there is
some variability in the size and shape of the microparticles; and the floor of planar cavity (106)
may be uneven. After the fluorescent image is collected, the focal plane of the microscope
objective is returned (814) to the microparticle focal plane, where another image is collected (816)
for the purpose of computing microparticle centers as described above. The image of
microparticle centers is transferred to data server (812), where data processor (818) assigns pixels
of the fluorescent image to each microparticle center, as described above. After the image of
microparticle centers is collected (816), the stage is moved so that an image of the next tile can be
collected (822). If there are no further tiles of microparticles (820), then the next steps and/or
cycles of the process are executed (826). If there are no further process steps (824), then the
process is complete and the apparatus is placed in a holding mode.
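The autofocus search and the per-tile correction table can be sketched as follows. This is an illustrative reading of the procedure, not the apparatus's software: `contrast_at` is a hypothetical callable that images at a given stage height and returns a contrast score, and returning the last position before the gain drops below the threshold is one reasonable choice of "maximum".

```python
def autofocus(contrast_at, z_start, step, threshold, max_steps=1000):
    """Step the stage in z until the contrast gain between successive
    positions falls below `threshold`; return the last z before that point."""
    z_prev = z_start
    c_prev = contrast_at(z_prev)
    for k in range(1, max_steps + 1):
        z = z_start + k * step
        c = contrast_at(z)
        if c - c_prev < threshold:
            return z_prev  # gain has flattened or reversed: treat as the maximum
        z_prev, c_prev = z, c
    raise RuntimeError("no contrast maximum found within max_steps")


def focus_corrections(tile_focus_z):
    """Correction table: focus offset of each tile relative to the first tile."""
    z0 = tile_focus_z[0]
    return [z - z0 for z in tile_focus_z]
```

For a synthetic contrast curve peaking at z = 3.0, stepping from 0.0 in increments of 0.5 with a threshold of 0.01 returns 3.0. A real contrast function would be noisy, so the threshold and step would need tuning.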

Optical signals collected in the course of analysis may be generated by a variety of
mechanisms, including absorption and fluorescence, chemiluminescence,
electrochemiluminescence, or bioluminescence emission. Extensive guidance is available for
selecting appropriate optical signaling means, e.g. Kessler, editor, Nonradioactive Labeling and
Detection of Biomolecules (Springer-Verlag, Berlin); Keller and Manak, DNA Probes, Second
Edition (Stockton Press, New York, 1993); and the like. Preferably, optical signals generated in
processing steps are fluorescence emissions.

Microparticles 

An important feature of the system of the invention is the use of microparticles for
carrying analytes. A variety of microparticles may be employed depending on particular
applications. Generally, microparticles must consist of a material compatible with the reagents
and chemistry of the process steps being carried out, and microparticles must be substantially
mechanically rigid so that they retain their shape and size during process steps. Preferably, as
used herein, the term "substantially mechanically rigid" means that microparticles neither swell
nor contract by more than ten percent (as measured by diameter) in any process solvent or reagent.
Preferably, microparticles are microspheres of uniform size, i.e. microparticles are monodisperse.
More preferably, the diameters of spherical microparticles have a coefficient of variation less than
five percent, and most preferably, less than two percent. Microparticle diameters are in the range
of from 0.1 µm to 100 µm. Preferably, microparticle diameters range from 1 µm to 20 µm. Most
preferably, microparticle diameters are in the range of 1 to 5 µm. Suitable microparticle materials
include inorganic support materials such as glass, e.g. controlled-pore glass, Ballotini beads; silica,
zirconia, and the like, e.g. Weetall, Methods in Enzymology, 44: 134-148 (1976); and organic
support materials such as highly cross-linked polystyrene, polyacrylate, polymethylmethacrylate,
glycidylmethacrylate (GMA), Dynabeads (Dynal, Oslo, Norway), and the like, e.g. Rembaum et al,
U.S. patent 4,046,720; Hodge and Sherrington, editors, pages 435-456, Polymer-supported
Reactions in Organic Synthesis (Wiley & Sons, New York, 1980); Andrus et al, U.S. patent
5,047,524; and the like.
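The rigidity and monodispersity criteria above reduce to two simple checks, sketched here under the assumption that bead diameters have been measured (e.g. by microscopy) before and after exposure to a process solvent. The function names and the use of the sample standard deviation for the coefficient of variation are illustrative choices.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """Percent coefficient of variation: sample std / mean x 100."""
    return 100.0 * statistics.stdev(diameters_um) / statistics.mean(diameters_um)


def substantially_rigid(dry_diameter_um, solvated_diameter_um, limit=0.10):
    """True if the diameter changes by no more than `limit` (ten percent in the text)."""
    return abs(solvated_diameter_um - dry_diameter_um) / dry_diameter_um <= limit


def monodispersity_grade(diameters_um):
    """Classify a bead lot against the preferred CV thresholds in the text."""
    cv = coefficient_of_variation(diameters_um)
    if cv < 2.0:
        return "most preferred (CV < 2%)"
    if cv < 5.0:
        return "more preferred (CV < 5%)"
    return "outside preferred range"
```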



In a preferred embodiment of the invention, identical copies of polynucleotides from a
population are anchored to separate microparticles by solid phase cloning, i.e. the use of
oligonucleotide tags for sorting polynucleotides onto microparticles such that only the same
kind of polynucleotide will be attached to the same microparticle, e.g. Brenner, U.S. patent
5,604,097, which is incorporated by reference. This condition is accomplished by taking a
sample of the full ensemble of tag-polynucleotide conjugates. (It is acceptable that identical
polynucleotides have different tags, as it merely results in the same polynucleotide being
operated on or analyzed twice in two different locations.) Such sampling can be carried out
overtly (for example, by taking a small volume from a larger mixture) after the tags have
been attached to the polynucleotides; it can be carried out inherently as a secondary effect of the
techniques used to process the polynucleotides and tags; or it can be carried out both
overtly and as an inherent part of processing steps.

Oligonucleotide tags for use with the invention are members of a minimally cross-hybridizing
set of oligonucleotides. The sequence of each oligonucleotide of such a set differs from
the sequence of every other member of the same set by at least two nucleotides. Thus, no
member of such a set can form a duplex (or triplex) with the complement of any other
member with fewer than two mismatches. Complements of oligonucleotide tags of the invention,
referred to herein as "tag complements," may comprise natural nucleotides or non-natural
nucleotide analogs. Tag complements are attached to microparticles.
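The defining property of a minimally cross-hybridizing set can be checked pairwise. The sketch below assumes tags of equal length compared position by position; this is a simplification, since cross-hybridization can also involve shifted alignments, which the check ignores.

```python
from itertools import combinations


def mismatches(a, b):
    """Number of aligned positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_minimally_cross_hybridizing(tags, min_diff=2):
    """True if every pair of tags differs at min_diff or more positions."""
    return all(mismatches(a, b) >= min_diff for a, b in combinations(tags, 2))


print(is_minimally_cross_hybridizing(["AAAT", "ATTA", "TTAA"]))  # True
print(is_minimally_cross_hybridizing(["AAAT", "AAAA"]))          # False
```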

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements may be
synthesized either combinatorially or individually depending on the size of the set desired and
the degree to which cross-hybridization is sought to be minimized (or, stated another way, the
degree to which specificity is sought to be enhanced). For example, a minimally cross-hybridizing
set may consist of a set of individually synthesized 10-mer sequences that differ
from each other by at least 4 nucleotides, such a set having a maximum size of 332 (when
composed of 3 kinds of nucleotides and counted using a computer program such as disclosed in
Appendix Ic of International patent application PCT/US96/09513). Alternatively, a minimally
cross-hybridizing set of oligonucleotide tags may also be assembled combinatorially from
subunits which themselves are selected from a minimally cross-hybridizing set. For example, a
set of minimally cross-hybridizing 12-mers differing from one another by at least three
nucleotides may be synthesized by assembling 3 subunits selected from a set of minimally cross-hybridizing
4-mers that each differ from one another by three nucleotides. Such an embodiment
gives a maximally sized set of 9^3, or 729, 12-mers, where 9 is the number of oligonucleotides generated
by the computer program of Appendix Ia of International patent application PCT/US96/09513,
which assumes, as with the 10-mers, that only 3 of the 4 different types of nucleotides are used.
The set is described as "maximal" because the computer programs disclosed in International
patent application PCT/US96/09513 provide the largest set for a given input (e.g. length,
composition, difference in number of nucleotides between members). Additional minimally
cross-hybridizing sets may be formed from subsets of such calculated sets.
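To illustrate combinatorial assembly, the sketch below builds nine 4-mers from three nucleotide types that pairwise differ at three positions, then concatenates three subunits into 729 12-mers. It does not reproduce the computer program of PCT/US96/09513. Instead it substitutes the ternary tetracode, a linear code with codewords (a, b, a+b, a+2b) mod 3, which also reaches nine words, the Singleton-bound maximum of 3^(4-3+1) for this length and distance. The choice of A, C and T as the three nucleotides is an assumption.

```python
from itertools import combinations, product

# Hypothetical choice of the three nucleotides; the text only says 3 of 4 are used.
ALPHABET = "ACT"


def four_mer_subunits():
    """Nine 4-mers that pairwise differ at >= 3 positions (ternary tetracode)."""
    words = []
    for a, b in product(range(3), repeat=2):
        digits = (a, b, (a + b) % 3, (a + 2 * b) % 3)
        words.append("".join(ALPHABET[d] for d in digits))
    return words


def combinatorial_tags(subunits, n_subunits=3):
    """All concatenations of n_subunits subunits: len(subunits) ** n_subunits tags."""
    return ["".join(p) for p in product(subunits, repeat=n_subunits)]


def min_distance(seqs):
    """Smallest number of mismatched positions over all pairs."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2))


subunits = four_mer_subunits()
tags = combinatorial_tags(subunits)
print(len(subunits), min_distance(subunits), len(tags))
```

Running it prints `9 3 729`. Any two assembled tags differ in at least one whole subunit, so they inherit the three-nucleotide minimum difference.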

When synthesized combinatorially, an oligonucleotide tag of the invention preferably
consists of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length, wherein each subunit is selected from the same minimally cross-hybridizing
set. In such embodiments, the number of oligonucleotide tags available depends on
the number of subunits per tag and on the length of the subunits.

As used herein in reference to oligonucleotide tags and tag complements, the term
"repertoire" means the minimally cross-hybridizing set of oligonucleotides that make up
the tags in a particular embodiment, or the corresponding set of tag complements.

Preferably, in constructing a cDNA library where substantially all different cDNAs have
different tags, a tag repertoire is employed whose complexity, or number of distinct tags, greatly
exceeds the total number of mRNAs extracted from a cell or tissue sample. Preferably, the
complexity of the tag repertoire is at least 10 times that of the polynucleotide population; and
more preferably, the complexity of the tag repertoire is at least 100 times that of the
polynucleotide population. Below, a protocol is disclosed for cDNA library construction using a
primer mixture that contains a full repertoire of exemplary 9-word tags. Such a mixture of tag-containing
primers has a complexity of 8^9, or about 1.34 x 10^8. As indicated by Winslow et al,
Nucleic Acids Research, 19: 3251-3253 (1991), mRNA for library construction can be extracted
from as few as 10-100 mammalian cells. Since a single mammalian cell contains about 5 x 10^5
copies of mRNA molecules of about 3.4 x 10^4 different kinds, by standard techniques one can
isolate the mRNA from about 100 cells, or (theoretically) about 5 x 10^7 mRNA molecules.
Comparing this number to the complexity of the primer mixture shows that without any
additional steps, and even assuming that mRNAs are converted into cDNAs with perfect
efficiency (1% efficiency or less is more accurate), the cDNA library construction protocol
results in a population containing no more than 37% of the total number of different tags. That
is, without any overt sampling step at all, the protocol inherently generates a sample that
comprises 37%, or less, of the tag repertoire. The probability of obtaining a double under these
conditions is about 5%, which is within the preferred range. With mRNA from 10 cells, the
fraction of the tag repertoire sampled is reduced to only 3.7%, even assuming that all the
processing steps take place at 100% efficiency. In fact, the efficiencies of the processing steps
for constructing cDNA libraries are very low, a "rule of thumb" being that a good library should
contain about 10^8 cDNA clones from mRNA extracted from 10^6 mammalian cells.
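The 37% and 3.7% figures follow from dividing the number of mRNA molecules by the repertoire complexity. This sketch assumes that a complexity of 8^9 means nine word positions, each filled from eight words, and that every mRNA yields exactly one tagged cDNA (the text's best case); the `efficiency` parameter is an illustrative addition.

```python
TAG_COMPLEXITY = 8 ** 9   # 9-word tags, 8 word choices per position: about 1.34e8
MRNA_PER_CELL = 5e5       # copies of mRNA per mammalian cell, from the text


def max_fraction_of_repertoire(n_cells, efficiency=1.0):
    """Upper bound on the fraction of distinct tags represented in the library.

    Each cDNA carries one tag, so the number of distinct tags cannot exceed
    the number of converted mRNA molecules.
    """
    n_molecules = n_cells * MRNA_PER_CELL * efficiency
    return min(1.0, n_molecules / TAG_COMPLEXITY)


for cells in (100, 10):
    print(cells, f"{100 * max_fraction_of_repertoire(cells):.1f}%")
```

This prints 37.3% for 100 cells and 3.7% for 10 cells. Setting `efficiency=0.01` shows how much further a realistic conversion rate lowers the bound.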

When larger amounts of mRNA are used in the above protocol, or larger amounts of
polynucleotides in general, such that the number of such molecules exceeds the complexity of the
tag repertoire, a tag-polynucleotide conjugate mixture potentially contains every possible pairing
of tags and types of mRNA or polynucleotide. In such cases, overt sampling may be
implemented by removing a sample volume after a serial dilution of the starting mixture of tag-polynucleotide
conjugates. The amount of dilution required depends on the amount of starting
material and the efficiencies of the processing steps, which are readily estimated.

If mRNA were extracted from 10^6 cells (which would correspond to about 0.5 µg of
poly(A)+ RNA), and if primers were present in about 10-100 fold concentration excess, as is
called for in a typical protocol, e.g. Sambrook et al, Molecular Cloning, Second Edition, page
8.61 [10 µL of 1.8 kb mRNA at 1 mg/mL equals about 1.68 x 10^-11 moles and 10 µL of 18-mer
primer at 1 mg/mL equals about 1.68 x 10^-9 moles], then the total number of tag-polynucleotide
conjugates in a cDNA library would simply be equal to or less than the starting number of
mRNAs, or about 5 x 10^11 vectors containing tag-polynucleotide conjugates. Again, this assumes
that each step in cDNA construction (first strand synthesis, second strand synthesis, ligation into
a vector) occurs with perfect efficiency, which is a very conservative estimate. The actual
number is significantly less.
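The mole quantities quoted from Sambrook et al can be reproduced with a few lines of arithmetic. The sketch assumes an average mass of 330 g/mol per nucleotide, a common approximation that the text does not state.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide; common approximation (assumption)


def moles_of_oligo(volume_ul, conc_mg_per_ml, length_nt):
    """Moles of a nucleic acid of given length in a given volume and concentration."""
    mass_g = volume_ul * 1e-3 * conc_mg_per_ml * 1e-3  # µL -> mL, then mg -> g
    return mass_g / (length_nt * AVG_NT_MASS)


mrna_mol = moles_of_oligo(10, 1.0, 1800)    # 1.8 kb mRNA
primer_mol = moles_of_oligo(10, 1.0, 18)    # 18-mer primer
print(f"{mrna_mol:.3g} {primer_mol:.3g} {primer_mol / mrna_mol:.0f}")

# Upper bound on conjugates from 10^6 cells at 5 x 10^5 mRNAs per cell
print(f"{1e6 * 5e5:.1e}")
```

This prints `1.68e-11 1.68e-09 100`, then `5.0e+11`. Equal masses of primer and mRNA therefore give a 100-fold molar excess of primer.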

If a sample of n tag-polynucleotide conjugates is randomly drawn from a reaction
mixture, as could be effected by taking a sample volume, the probability of drawing conjugates
having the same tag is described by the Poisson distribution, P(r) = e^(-λ) λ^r / r!, where r is the
number of conjugates having the same tag and λ = np, where p is the probability of a given tag
being selected. If n = 10^6 and p = 1/(1.34 x 10^8), then λ = 0.00746 and P(2) = 2.76 x 10^-5. Thus, a
sample of one million molecules gives rise to an expected number of doubles well within the
preferred range. Such a sample is readily obtained as follows: Assume that the 5 x 10^11
mRNAs are perfectly converted into 5 x 10^11 vectors with tag-cDNA conjugates as inserts and
that the 5 x 10^11 vectors are in a reaction solution having a volume of 100 µl. Four 10-fold
serial dilutions may be carried out by transferring 10 µl from the original solution into a vessel
containing 90 µl of an appropriate buffer, such as TE. This process may be repeated for three
additional dilutions to obtain a 100 µl solution containing 5 x 10^5 vector molecules per µl. A 2
µl aliquot from this solution yields 10^6 vectors containing tag-cDNA conjugates as inserts. This
sample is then amplified by straightforward transformation of a competent host cell followed by
culturing.
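The Poisson estimate and the dilution series above can be checked numerically. This sketch uses a sample of one million conjugates, a repertoire of 1.34 x 10^8 tags, and 5 x 10^11 vectors in 100 µl, as in the text; the helper names are illustrative.

```python
import math


def poisson(r, lam):
    """Poisson probability P(r) = exp(-lam) * lam**r / r!."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


n = 1e6                    # conjugates drawn
p = 1 / 1.34e8             # chance of drawing any particular tag
lam = n * p
print(f"lambda = {lam:.5f}, P(2) = {poisson(2, lam):.3g}")


def serial_dilution(total_molecules, volume_ul, n_dilutions, factor=10):
    """Molecules per µL after n successive 1:factor dilutions."""
    return total_molecules / volume_ul / factor ** n_dilutions


per_ul = serial_dilution(5e11, 100, 4)
print(f"{per_ul:.1e} per µL; 2 µL aliquot = {2 * per_ul:.1e} vectors")
```

The output matches the text: λ = 0.00746, P(2) = 2.76e-05, and a 2 µL aliquot holds 1.0e+06 vectors.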
Of course, as mentioned above, no step in the above process proceeds with perfect
efficiency. In particular, when vectors are employed to amplify a sample of tag-polynucleotide
conjugates, the step of transforming a host is very inefficient. Usually, no more than 1% of the
vectors are taken up by the host and replicated. Thus, for such a method of amplification, even
fewer dilutions would be required to obtain a sample of 10^6 conjugates.

A repertoire of oligonucleotide tags can be conjugated to a population of polynucleotides
in a number of ways, including direct enzymatic ligation, amplification, e.g. via PCR, using
primers containing the tag sequences, and the like. The initial ligating step produces a very large
population of tag-polynucleotide conjugates such that a single tag is generally attached to many
different polynucleotides. However, as noted above, by taking a sufficiently small sample of the
conjugates, the probability of obtaining "doubles," i.e. the same tag on two different
polynucleotides, can be made negligible. Generally, the larger the sample, the greater the
probability of obtaining a double. Thus, a design trade-off exists between selecting a large
sample of tag-polynucleotide conjugates (which, for example, ensures adequate coverage of a
target polynucleotide in a shotgun sequencing operation or adequate representation of a rapidly
changing mRNA pool) and selecting a small sample, which ensures that a minimal number of
doubles will be present. In most embodiments, the presence of doubles merely adds an
additional source of noise or, in the case of sequencing, a minor complication in scanning and
signal processing, as microparticles giving multiple fluorescent signals can simply be ignored.

As used herein, the term "substantially all" in reference to attaching tags to molecules,
especially polynucleotides, is meant to reflect the statistical nature of the sampling procedure
employed to obtain a population of tag-molecule conjugates essentially free of doubles. The
meaning of substantially all in terms of actual percentages of tag-molecule conjugates depends
on how the tags are being employed. Preferably, for nucleic acid sequencing, substantially all
means that at least eighty percent of the polynucleotides have unique tags attached. More
preferably, it means that at least ninety percent of the polynucleotides have unique tags attached.
Still more preferably, it means that at least ninety-five percent of the polynucleotides have
unique tags attached. And, most preferably, it means that at least ninety-nine percent of the
polynucleotides have unique tags attached.
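One way to relate these percentages to sample size is sketched below. It assumes tags are drawn uniformly and independently from the repertoire, so a given conjugate's tag is unshared with probability (1 - 1/R)^(n-1) for a sample of n conjugates and repertoire size R. This model is an illustrative assumption, not part of the original method.

```python
import math


def unique_tag_fraction(n, repertoire):
    """Expected fraction of n sampled conjugates whose tag no other sampled
    conjugate shares, assuming uniform random tag assignment."""
    return (1 - 1 / repertoire) ** (n - 1)


def max_sample_size(target_fraction, repertoire):
    """Largest n for which unique_tag_fraction(n, repertoire) >= target_fraction."""
    return 1 + math.floor(math.log(target_fraction) / math.log1p(-1 / repertoire))


REPERTOIRE = 1.34e8
for target in (0.80, 0.90, 0.95, 0.99):
    print(target, max_sample_size(target, REPERTOIRE))
```

Under this model, a sample of about 1.35 million conjugates from a repertoire of 1.34 x 10^8 still leaves 99% of conjugates with unique tags.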

Tags can be conjugated to cDNAs of existing libraries by standard cloning methods.
cDNAs are excised from their existing vector, isolated, and then ligated into a vector containing
a repertoire of tags. Preferably, the tag-containing vector is linearized by cleaving with two
restriction enzymes so that the excised cDNAs can be ligated in a predetermined orientation.
The concentration of the linearized tag-containing vector is in substantial excess over that of the
cDNA inserts so that ligation provides an inherent sampling of tags.

A general method for exposing the single stranded tag after amplification involves
digesting a target polynucleotide-containing conjugate with the 3'→5' exonuclease activity of T4
DNA polymerase, or a like enzyme, e.g. as described in Kuijper et al, Gene, 112: 147-155
(1992). When used in the presence of a single deoxynucleoside triphosphate, such a polymerase
will remove nucleotides from the 3' end of the non-template strand of a double
stranded fragment until a complement of the single deoxynucleoside triphosphate is reached on
the template strand. When such a nucleotide is reached, the 3'→5' digestion effectively ceases, as
the polymerase's extension activity adds nucleotides at a higher rate than the excision activity
removes nucleotides. Consequently, single stranded tags constructed from only three of the four nucleotides are
readily prepared for loading onto solid phase supports.
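The trimming behaviour can be modelled as a toy string operation. This sketch assumes the non-template strand is written 5' to 3' and idealizes the enzyme: bases are removed from the 3' end until the terminal base equals the supplied dNTP, where excision and re-addition balance and digestion stops. It illustrates the principle only; the function name is hypothetical.

```python
def t4_chew_back(non_template_5to3, dntp):
    """Toy model of exonuclease trimming in the presence of a single dNTP.

    Returns (remaining non-template strand, number of template bases left
    single stranded).
    """
    strand = non_template_5to3
    while strand and strand[-1] != dntp:
        strand = strand[:-1]  # excise the 3'-terminal base
    return strand, len(non_template_5to3) - len(strand)


print(t4_chew_back("ACGCATTACAT", "G"))  # ('ACG', 8)
```

A tag region lacking G on the non-template strand is therefore exposed in full when only dGTP is supplied, which is why tags are built from three nucleotide types.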

After the oligonucleotide tags are prepared for specific hybridization, e.g. by rendering
them single stranded as described above, the polynucleotides are mixed with microparticles
containing the complementary sequences of the tags under conditions that favor the formation of
perfectly matched duplexes between the tags and their complements. There is extensive
guidance in the literature for creating these conditions. Exemplary references providing such
guidance include Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26: 227-259
(1991); Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring
Harbor Laboratory, New York, 1989); and the like. Preferably, the hybridization conditions are
sufficiently stringent so that only perfectly matched sequences form stable duplexes. Under such
conditions the polynucleotides specifically hybridized through their tags may be ligated to the
complementary sequences attached to the microparticles. Finally, the microparticles are washed
to remove polynucleotides with unligated and/or mismatched tags.
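The sorting outcome under idealized stringent conditions can be sketched as exact matching between each tag and the tag complements on the beads. This assumes both sequences are written 5' to 3' and treats any mismatch as a failure to load. Real hybridization specificity depends on the conditions cited above, which this sketch does not model.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sort_onto_beads(conjugates, bead_tag_complements):
    """Assign each (tag, insert) conjugate to the bead carrying the exact
    complement of its tag; conjugates with no perfect match stay unloaded.

    bead_tag_complements maps bead id -> tag complement (written 5'->3').
    """
    bead_by_complement = {comp: bead for bead, comp in bead_tag_complements.items()}
    loaded, unloaded = {}, []
    for tag, insert in conjugates:
        bead = bead_by_complement.get(reverse_complement(tag))
        if bead is None:
            unloaded.append((tag, insert))  # removed in the wash step
        else:
            loaded.setdefault(bead, []).append(insert)
    return loaded, unloaded
```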

Preferably, for sequencing applications, standard CPG beads of diameter in the range of
20-50 µm are loaded with about 10^5 polynucleotides, and glycidylmethacrylate (GMA) beads
available from Bangs Laboratories (Carmel, IN) of diameter in the range of 5-10 µm are loaded
with a few tens of thousands of polynucleotides, e.g. 4 x 10^4 to 6 x 10^4, up to a hundred thousand
polynucleotides.

DNA Sequencing

Polynucleotides loaded onto microparticles may be simultaneously sequenced in the
instant apparatus using a "base-by-base" DNA sequencing methodology. Such sequencing
methodology permits the stepwise identification of a sequence of nucleotides in a target
polynucleotide, usually one base at a time, through successive cycles of treatment and detection.
Base-by-base approaches are disclosed in the following references: Cheeseman, U.S. patent
5,302,509; Tsien et al, International application WO 91/06678; Rosenthal et al, International
application WO 93/21340; Canard et al, Gene, 148: 1-6 (1994); Metzker et al, Nucleic Acids
Research, 22: 4259-4267 (1994); and the like. Preferably, the base-by-base approach disclosed
by Brenner in U.S. patent 5,599,675 is used with the apparatus of the invention to sequence
polynucleotides on a population of loaded microparticles disposed as a planar array in the flow
chamber. Accordingly, Brenner, U.S. patent 5,599,675 is incorporated by reference. Preferably,
the population of loaded microparticles for sequencing includes at least ten thousand loaded
microparticles; more preferably, such a population includes at least fifty thousand loaded
microparticles; and still more preferably, such a population includes at least one hundred
thousand loaded microparticles.

Preferably, the sequencing method of Brenner (cited above) is employed in the
embodiment disclosed in Albrecht et al, International patent application PCT/US97/09472, which
discloses the use of encoded adaptors. An encoded adaptor is a double stranded oligonucleotide
comprising a protruding strand and an oligonucleotide tag selected from a minimally cross-hybridizing
set of oligonucleotides. Encoded adaptors whose protruding strands form perfectly
matched duplexes with the complementary protruding strands of the target polynucleotide are
ligated. After ligation, the identity and ordering of the nucleotides in the protruding strands are
determined, or "decoded," by specifically hybridizing a labeled tag complement to its
corresponding tag on the ligated adaptor. Encoded adaptors may be used in an adaptor-based
method of DNA sequencing that includes repeated cycles of ligation, identification, and cleavage,
such as the method described in Brenner (cited above). Briefly, such a method comprises the
following steps: (a) ligating an encoded adaptor to an end of a polynucleotide, the encoded
adaptor having a nuclease recognition site of a nuclease whose cleavage site is separate from its
recognition site; (b) identifying one or more nucleotides at the end of the polynucleotide by the
identity of the encoded adaptor ligated thereto; (c) cleaving the polynucleotide with a nuclease
recognizing the nuclease recognition site of the encoded adaptor such that the polynucleotide is
shortened by one or more nucleotides; and (d) repeating said steps (a) through (c) until said
nucleotide sequence of the polynucleotide is determined. In the identification step, successive sets
of tag complements are specifically hybridized to the respective tags carried by encoded adaptors
ligated to the ends of the target polynucleotides, as described above. The type and sequence of
nucleotides in the protruding strands of the polynucleotides are identified by the label carried by
the specifically hybridized tag complement and the set from which the tag complement came.

Construction and Sorting of cDNA Library for
Signature Sequencing with Encoded Adaptors

In this example, a cDNA library is constructed in which an oligonucleotide tag consisting
of 8 four-nucleotide "words" is attached to each cDNA. As described above, the repertoire of
oligonucleotide tags of this size is sufficiently large (about 10^) so that if the cDNAs are
synthesized from a population of about 10^ mRNAs, then there is a high probability that each
cDNA will have a unique tag for sorting. After mRNA extraction, first strand synthesis is carried
out in the presence of 5-Me-dCTP (to block certain cDNA restriction sites) and a biotinylated
primer mixture containing the oligonucleotide tags. After conventional second strand synthesis,
the tag-cDNA conjugates are cleaved with Dpn II (which is unaffected by the 5-Me-deoxycytosines),
the biotinylated portions are separated from the reaction mixture using
streptavidin-coated magnetic beads, and the tag-cDNA conjugates are recovered by cleaving them
from the magnetic beads via a Bsm BI site carried by the biotinylated primer. The Bsm BI-Dpn II
fragment containing the tag-cDNA conjugate is then inserted into a plasmid and amplified. After
isolation of the plasmids, tag-cDNA conjugates are amplified out of the plasmids by PCR in the
presence of 5-Me-dCTP, using biotinylated and fluorescently labeled primers containing
predefined restriction endonuclease sites. After affinity purification with streptavidin-coated
magnetic beads, the tag-cDNA conjugates are cleaved from the beads, treated with T4 DNA
polymerase in the presence of dGTP to render the tags single stranded, and then combined with a
repertoire of GMA beads having tag complements attached. After stringent hybridization and
ligation, the GMA beads are sorted via FACS to produce an enriched population of GMA beads
loaded with cDNAs. The enriched population of loaded GMA beads is immobilized in a planar
array in a flow chamber where base-by-base sequencing takes place using encoded adaptors, as
disclosed in Albrecht et al, International patent application PCT/US97/09472.

Approximately 5 µg of poly(A+) mRNA is extracted from DBY746 yeast cells using
conventional protocols. First and second strand cDNA synthesis is carried out by combining
100-150 pmoles of the following primer (SEQ ID NO: 1):

5'-biotin-ACTAAT CGTCTC ACTAT TTAATTAA [W,W,W,G]gCC(T)iqV-3'

with the poly(A+) mRNA using a Stratagene (La Jolla, CA) cDNA Synthesis Kit in accordance
with the manufacturer's protocol. This results in cDNAs whose first strand deoxycytosines are
methylated at the 5-carbon position. In the above formula, "V" is G, C, or A; "[W,W,W,G]" is a
four-nucleotide word selected from Table II of Brenner, International patent application
PCT/US96/09513; CGTCTC is a Bsm BI recognition site; and TTAATTAA is a Pac I
recognition site. After size fractionation (GIBCO-BRL cDNA Size Fractionation Kit) using
conventional protocols, the cDNAs are digested with Dpn II (New England Biolabs, Beverly, MA)
using the manufacturer's protocol and affinity purified with streptavidin-coated magnetic beads
(M-280 beads, Dynal A.S., Oslo, Norway). The DNA captured by the beads is digested with
Bsm BI to release the tag-cDNA conjugates for cloning into a modified pBCSK- vector
(Stratagene, La Jolla, CA) using standard protocols. The pBCSK- vector is modified by adding a
Bbs I site by inserting the following fragment (SEQ ID NO: 2) into the Kpn I/Eco RV digested
vector.

       CGAAGACCC
3'-CATGGCTTCTGGGGATA-5'

The Bsm BI/Dpn II digested tag-cDNA conjugate is inserted into the pBCSK- vector, which has
previously been digested with Bbs I and Bam HI. After ligation, the vector is transfected into the
manufacturer's recommended host for amplification.
After isolating the above pBCSK- vector from a standard plasmid miniprep, the tag-cDNA
conjugates are amplified by PCR in the presence of 5-Me-dCTP using 20-mer primers
complementary to vector sequences flanking the tag-cDNA insert. The "upstream" primer, i.e.
adjacent to the tag, is biotinylated and the "downstream" primer, i.e. adjacent to the cDNA, is
labeled with fluorescein. After amplification, the PCR product is affinity purified, then cleaved
with Pac I to release fluorescently labeled tag-cDNA conjugates. The tags of the conjugates are
rendered single stranded by treating them with T4 DNA polymerase in the presence of dGTP.
After the reaction is quenched, the tag-cDNA conjugate is purified by phenol-chloroform
extraction and combined with 5.5 µm GMA beads carrying tag complements, each tag
complement having a 5' phosphate. Hybridization is conducted under stringent conditions in the
presence of a thermostable ligase so that only tags forming perfectly matched duplexes with
their complements are ligated. The GMA beads are washed and the loaded beads are
concentrated by FACS sorting, using the fluorescently labeled cDNAs to identify loaded GMA
beads. The tag-cDNA conjugates attached to the GMA beads are digested with Dpn II to remove
the fluorescent label and treated with alkaline phosphatase to prepare the cDNAs for sequencing.
That is, phosphatase is used to remove the 5' phosphate from the ends of the cDNAs to prevent
unwanted cDNA-cDNA ligations by way of the palindromic Dpn II site.

The following cleavage adaptor (SEQ ID NO: 3) is ligated to the Dpn II-digested and
phosphatase-treated cDNAs:

pGATCAGCTGCTGCAAATTT
    pTCGACGACGTTTAAA

After ligation, the 3' phosphate is removed by alkaline phosphatase, the 5' strand of the cDNA is
treated with T4 DNA kinase, and the nick between the cleavage adaptor and the cDNA is ligated.
After cleavage by Bbv I, encoded adaptors are ligated to the ends of the cDNAs, and the beads
are ready for loading into the flow chamber.

Ligation of the adaptors to the target polynucleotide is carried out in a mixture consisting
of 5 µL beads (20 mg), 3 µL NEB 10x ligase buffer, 5 µL adaptor mix (25 nM), 2.5 µL
NEB T4 DNA ligase (2000 units/µL), and 14.5 µL distilled water. The mixture is incubated at
16°C for 30 minutes, after which the beads are washed 3 times in TE (pH 8.0).

After centrifugation and removal of TE, the 3' phosphates of the ligated adaptors are
removed by treating the polynucleotide-bead mixture with calf intestinal alkaline phosphatase
(CIP) (New England Biolabs, Beverly, MA), using the manufacturer's protocol. After removal of
the 3' phosphates, the CIP may be inactivated by proteolytic digestion, e.g. using Pronase™
(available from Boehringer Mannheim, Indianapolis, IN) or an equivalent protease, following the
manufacturer's protocol. The polynucleotide-bead mixture is then washed and treated with a mixture
of T4 polynucleotide kinase and T4 DNA ligase (New England Biolabs, Beverly, MA) to add a 5'
phosphate at the gap between the target polynucleotide and the adaptor and to complete the
ligation of the adaptors to the target polynucleotide. The bead-polynucleotide mixture is then
washed in TE and diluted to a concentration of approximately 100 thousand beads per µL, and 5 µL
of the resulting solution is loaded into a flow chamber with the help of the holders of Figure 4.
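The reagent and loading figures above can be checked with simple C1V1 = C2V2 dilution arithmetic. This is a minimal sketch, assuming that the listed volumes make up the whole reaction and that 10x, 25 nM, and 2000 units/µL are stock concentrations; the dictionary layout and names are illustrative only.

```python
# Reaction components in µL, as listed in the text. Assumption: these
# volumes make up the whole reaction and the stated concentrations are stocks.
MIX_UL = {
    "beads": 5.0,
    "NEB 10x ligase buffer": 3.0,
    "adaptor mix": 5.0,
    "T4 DNA ligase": 2.5,
    "distilled water": 14.5,
}
STOCK = {
    "NEB 10x ligase buffer": 10.0,   # x
    "adaptor mix": 25.0,             # nM
    "T4 DNA ligase": 2000.0,         # units/µL
}


def final_concentration(stock, volume_ul, total_ul):
    """Dilution via C1*V1 = C2*V2: concentration once mixed into total_ul."""
    return stock * volume_ul / total_ul


total_ul = sum(MIX_UL.values())
final = {name: final_concentration(c, MIX_UL[name], total_ul)
         for name, c in STOCK.items()}

# Loading: ~100 thousand beads per µL, 5 µL loaded, ~100 thousand beads per tile.
BEADS_PER_UL = 100_000
LOADED_UL = 5
BEADS_PER_TILE = 100_000
beads_loaded = BEADS_PER_UL * LOADED_UL
tiles = beads_loaded // BEADS_PER_TILE

print(f"total reaction volume: {total_ul:g} uL")
for name, value in final.items():
    print(f"  {name}: {value:.3g} final")
print(f"beads loaded: {beads_loaded}, about {tiles} tiles")
```

Under these assumptions the buffer ends at 1x in a 30 µL reaction, and the 500 thousand loaded beads match the five tiles and roughly 500,000 cDNAs analyzed in the three-cycle run described later.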

The top strands of the following 16 sets of 64 encoded adaptors (SEQ ID NO: 4 through
SEQ ID NO: 19) are each separately synthesized on an automated DNA synthesizer (model 392,
Applied Biosystems, Foster City, CA) using standard methods. The bottom strand, which is the same
for all adaptors, is synthesized separately and then hybridized to the respective top strands:

SEQ ID NO.   Encoded Adaptor

 4   5'-pANNNTACAGCTGCATCCCttggcgctgagg
            pATGCACGCGTAGGG-5'

 5   5'-pNANNTACAGCTGCATCCCtgggcctgtaag
            pATGCACGCGTAGGG-5'

 6   5'-pCNNNTACAGCTGCATCCCttgacgggtctc
            pATGCACGCGTAGGG-5'

 7   5'-pNCNNTACAGCTGCATCCCtgcccgcacagt
            pATGCACGCGTAGGG-5'

 8   5'-pGNNNTACAGCTGCATCCCttcgcctcggac
            pATGCACGCGTAGGG-5'

 9   5'-pNGNNTACAGCTGCATCCCtgatccgctagc
            pATGCACGCGTAGGG-5'

10   5'-pTNNNTACAGCTGCATCCCttccgaacccgc
            pATGCACGCGTAGGG-5'

11   5'-pNTNNTACAGCTGCATCCCtgagggggatag
            pATGCACGCGTAGGG-5'

12   5'-pNNANTACAGCTGCATCCCttcccgctacac
            pATGCACGCGTAGGG-5'

13   5'-pNNNATACAGCTGCATCCCtgactccccgag
            pATGCACGCGTAGGG-5'

14   5'-pNNCNTACAGCTGCATCCCtgtgttgcgcgg
            pATGCACGCGTAGGG-5'

15   5'-pNNNCTACAGCTGCATCCCtctacagcagcg
            pATGCACGCGTAGGG-5'

16   5'-pNNGNTACAGCTGCATCCCtgtcgcgtcgtt
            pATGCACGCGTAGGG-5'

17   5'-pNNNGTACAGCTGCATCCCtcggagcaacct
            pATGCACGCGTAGGG-5'

18   5'-pNNTNTACAGCTGCATCCCtggtgaccgtag
            pATGCACGCGTAGGG-5'

19   5'-pNNNTTACAGCTGCATCCCtcccctgtcgga
            pATGCACGCGTAGGG-5'
where N and p are as defined above, and the nucleotides indicated in lower-case letters are the
12-mer oligonucleotide tags. Each tag differs from every other by 6 nucleotides. Equimolar
quantities of each adaptor are combined in NEB #2 restriction buffer (New England Biolabs,
Beverly, MA) to form a mixture at a concentration of 1000 pmol/µL.
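Whether the tag set meets the stated 6-nucleotide separation can be checked directly from the table. This is a minimal sketch, assuming that "differs by" means Hamming distance (position-by-position mismatches, with no insertions or deletions considered); `hamming` and `closest_pair` are hypothetical helper names.

```python
from itertools import combinations

# 12-mer tags (lower-case part of each encoded adaptor), keyed by SEQ ID NO.
TAGS = {
    4: "ttggcgctgagg", 5: "tgggcctgtaag", 6: "ttgacgggtctc",
    7: "tgcccgcacagt", 8: "ttcgcctcggac", 9: "tgatccgctagc",
    10: "ttccgaacccgc", 11: "tgagggggatag", 12: "ttcccgctacac",
    13: "tgactccccgag", 14: "tgtgttgcgcgg", 15: "tctacagcagcg",
    16: "tgtcgcgtcgtt", 17: "tcggagcaacct", 18: "tggtgaccgtag",
    19: "tcccctgtcgga",
}


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_pair(tags):
    """(distance, id1, id2) for the least-separated pair of tags."""
    return min((hamming(tags[i], tags[j]), i, j)
               for i, j in combinations(sorted(tags), 2))


d, i, j = closest_pair(TAGS)
print(f"closest tags: SEQ ID NO {i} and {j}, {d} mismatches")
```

Comparing the printed minimum with 6 tests the stated property. A fuller check for cross-hybridization would also compare each tag against the reverse complements of the others, which this sketch does not do.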
Each of the 16 tag complements is separately synthesized as an amino-derivatized
oligonucleotide and labeled with a fluorescein molecule (using an NHS-ester of
fluorescein, available from Molecular Probes, Eugene, OR), which is attached to the 5' end of the
tag complement through a polyethylene glycol linker (Clontech Laboratories, Palo Alto, CA).
The sequences of the tag complements are simply the 12-mer complements of the tags listed
above.
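The tag-complement sequences can be derived mechanically from the tags. This is a minimal sketch, assuming "complement" means the Watson-Crick reverse complement written 5' to 3' (the orientation in which a probe would be synthesized); the fluorescein label and PEG linker are not modeled, and `tag_complement` is a hypothetical name.

```python
# Watson-Crick pairing for lower-case DNA letters.
_PAIR = str.maketrans("acgt", "tgca")


def tag_complement(tag):
    """Reverse complement of a tag, written 5' to 3' like the probe itself."""
    return tag.lower().translate(_PAIR)[::-1]


tag_4 = "ttggcgctgagg"                  # tag of SEQ ID NO: 4
probe_4 = tag_complement(tag_4)
print(probe_4)                           # cctcagcgccaa
```

Applying the function twice returns the original tag, which is a quick way to confirm the pairing table is consistent.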

A flow chamber of the design shown in Figures 2a and 2b is employed in association with
an Olympus Optical Co., Ltd. (Tokyo, Japan) model BX60MF5 fluorescence microscope fitted with
a model U-ULS75XE 75 watt Xenon arc lamp, a motorized filter wheel, a Ludl Electronic
Products, Ltd. computer-controlled stage, and a Photometrics, Ltd. (Tucson, AZ) PXL CCD
camera with a 2000 x 2000 pixel array. Appropriate bandpass filters (122) and (124) are
employed for exciting fluorescein and transmitting fluorescent signal to CCD camera (120).
Microparticle positions are determined by top-lighting with broadband light from Xenon lamp 
(126) reduced by a factor of about 10*^ with a neutral density filter. Fluorescent images are 
collected with about 2 minute exposure times. 

Height (204) of flow chamber (201) is selected to be 7 µm, or approximately 140% of the
diameter of the GMA beads. Width (210) of flow chamber (201) is selected so as to ensure that a
3x3 array of 9 image pixels will cover approximately 40-60% of a bead's image after 10x
magnification (as illustrated in Figure 7). Thus, in order to capture images of tiles of about 100
thousand 5 µm GMA beads, width (210) is selected to have a value of 1.7 mm. Length (212) is
selected so that the flow chamber can hold from 1 to 10 tiles of about one hundred thousand 5 µm
diameter beads each. The cross section (220) of inlet passage (214) matches that of the inlet
tubing and gradually enlarges to match that of flow chamber (201) in the region of the planar
cavity, i.e. the region holding the GMA beads on which analysis is performed. It is desirable to
have a constant cross section through the planar cavity of flow chamber (201) to minimize the
creation of non-uniform flow patterns, as might occur with sudden constrictions and/or expansions
in cross section. Both body (218) and cover (216) of flow chamber (201) are glass, and the planar
cavity and channels of body (218) are formed by standard chemical etching techniques. Cross
section (222) of outlet passage (224) is selected to match the cross section of flow chamber (201)
at dam (202).
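The chamber dimensions can be sanity-checked with back-of-envelope arithmetic. The sketch assumes, which the text does not state, that a tile is a square monolayer of touching beads on a square lattice; under that assumption about 316 bead diameters span a tile side, so a 1.7 mm width leaves a modest margin. `chamber_length_mm` is a hypothetical helper that lays square tiles end to end.

```python
import math

BEAD_DIAMETER_UM = 5.0      # GMA bead diameter from the text
BEADS_PER_TILE = 100_000    # "about 100 thousand" beads per tile
STATED_WIDTH_MM = 1.7       # width (210) from the text

# Height (204): 140% of the bead diameter, so beads cannot stack two deep.
height_um = BEAD_DIAMETER_UM * 140 / 100

# Assumption: square monolayer of touching beads on a square lattice.
beads_per_side = math.sqrt(BEADS_PER_TILE)
min_width_mm = beads_per_side * BEAD_DIAMETER_UM / 1000.0
margin = STATED_WIDTH_MM / min_width_mm


def chamber_length_mm(n_tiles, tile_side_mm=STATED_WIDTH_MM):
    """Hypothetical: length needed for n square tiles laid end to end."""
    return n_tiles * tile_side_mm


print(f"height {height_um:.1f} um; minimum width {min_width_mm:.2f} mm "
      f"(stated width is {margin:.2f}x that)")
print(f"1-10 tiles need {chamber_length_mm(1):.1f}-{chamber_length_mm(10):.1f} mm")
```

Hexagonal packing or deliberate spacing between beads would change the minimum width, so the result is an order-of-magnitude check rather than a design rule.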

The fluidics system of Figure 5a, which includes all valves, syringe pump (500), and
Peltier block (152), is controlled by code written in LabVIEW 5.0 (National Instruments, Austin,
TX) and run on a Compaq Deskpro Pentium-based computer, which is connected to the
various components of the fluidics system by standard I/O circuit boards. Detection system (114)
and overall control of the instrument are handled by a Sun Microsystems (Mountain View,
CA) SPARCstation 5.
Three cycles of ligation, identification, and cleavage are carried out in flow chamber (201)
to give the sequences of the 12 terminal nucleotides of each of approximately 500,000 cDNAs.
That is, five tiles of GMA beads are analyzed in the following series of process steps:
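Each of the 16 adaptor sets in the table fixes one base (A, C, G, or T) at one of the four overhang positions and leaves the other three degenerate, giving 4^3 = 64 members per set; three cycles of four positions yield the 12 nucleotides. The sketch below is an illustrative reading of that encoding, assuming a bead's signals are recorded as the SEQ ID NOs whose tag complements light it up. It reports bases as written in the adaptor overhangs and does not convert them to the cDNA strand; all function names are hypothetical.

```python
from itertools import product

# Top-strand overhangs of the 16 encoded-adaptor sets (SEQ ID NO: 4-19),
# copied from the table; N stands for any of A, C, G, T.
OVERHANGS = {
    4: "ANNN", 5: "NANN", 6: "CNNN", 7: "NCNN",
    8: "GNNN", 9: "NGNN", 10: "TNNN", 11: "NTNN",
    12: "NNAN", 13: "NNNA", 14: "NNCN", 15: "NNNC",
    16: "NNGN", 17: "NNNG", 18: "NNTN", 19: "NNNT",
}


def expand(overhang):
    """All concrete 4-mers covered by a degenerate overhang (N = A/C/G/T)."""
    choices = ["ACGT" if b == "N" else b for b in overhang]
    return ["".join(p) for p in product(*choices)]


def fixed_base(overhang):
    """(0-based position, base) of the single non-N letter in an overhang."""
    hits = [(i, b) for i, b in enumerate(overhang) if b != "N"]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one fixed base in {overhang!r}")
    return hits[0]


def decode_cycle(fired_ids):
    """Rebuild a 4-nt call from the SEQ ID NOs whose tag complements lit a bead.

    '?' marks a position with no signal, 'X' one with conflicting signals.
    """
    calls = ["?"] * 4
    for seq_id in fired_ids:
        pos, base = fixed_base(OVERHANGS[seq_id])
        calls[pos] = base if calls[pos] in ("?", base) else "X"
    return "".join(calls)


def decode_bead(cycles):
    """Three ligation/identification/cleavage cycles give a 12-nt read."""
    return "".join(decode_cycle(fired) for fired in cycles)


print(decode_bead([{4, 7, 16, 19}, {10, 5, 14, 17}, {8, 9, 12, 15}]))
# prints ACGTTACGGGAC
```

Marking conflicts with `X` rather than picking a winner keeps ambiguous beads visible for later filtering.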



1. Calibrate focal plane of GMA beads.
2. Hybridize decoder.
3. Autofocus on tile 1.
4. Set focus to bead centers.
5. Collect fluorescent image.
6. Set focus to bead focal plane (scatter centers).
7. Collect image.
8. Repeat steps 4-7 for remaining tiles.
9. Wash.
10. Repeat steps 2-9 for remaining decoders.
11. Cleave encoded adaptor.
12. Wash.
13. Ligate top strand of next encoded adaptor.
14. Wash.
15. Repeat steps 13-14.
16. Kinase bottom strand of encoded adaptor.
17. Wash.
18. Ligate bottom strand of encoded adaptor.
19. Wash.
20. Repeat steps 2-9.
21. Repeat steps 11-19 for next encoded adaptor.
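One reading of the step list as nested loops is sketched below. It assumes that the "decoders" are the 16 fluorescent tag complements, that step 3 recurs with each decoder because step 10 repeats steps 2-9, and that step 20 means running the full decoding of steps 2-10 for each new adaptor, since all four positions can only be called with all 16 tag complements. The operation names are descriptive strings, not instrument commands.

```python
# Counts taken from the text: 16 tag complements ("decoders"), five tiles of
# ~100 thousand beads, and three ligation/identification/cleavage cycles.
N_DECODERS = 16
N_TILES = 5
N_CYCLES = 3


def decode_all(ops):
    """Steps 2-10: for every decoder, hybridize, image all tiles, then wash."""
    for d in range(1, N_DECODERS + 1):
        ops.append(f"hybridize decoder {d}")          # step 2
        ops.append("autofocus on tile 1")              # step 3
        for t in range(1, N_TILES + 1):                # steps 4-8
            ops.append(f"tile {t}: focus on bead centers")
            ops.append(f"tile {t}: collect fluorescent image")
            ops.append(f"tile {t}: focus on scatter centers")
            ops.append(f"tile {t}: collect image")
        ops.append("wash")                             # step 9


def next_adaptor(ops):
    """Steps 11-19: cleave the current encoded adaptor and ligate the next."""
    ops.extend(["cleave encoded adaptor", "wash"])     # steps 11-12
    for _ in range(2):                                 # steps 13-14, repeated once (15)
        ops.extend(["ligate top strand of next encoded adaptor", "wash"])
    ops.extend(["kinase bottom strand", "wash",        # steps 16-17
                "ligate bottom strand", "wash"])       # steps 18-19


def build_schedule():
    ops = ["calibrate focal plane"]                    # step 1
    decode_all(ops)                                    # first adaptor
    for _ in range(N_CYCLES - 1):                      # step 21
        next_adaptor(ops)
        decode_all(ops)                                # step 20, read as steps 2-10
    return ops


schedule = build_schedule()
print(len(schedule), "operations")
```

Under this reading a run collects 3 x 16 x 5 = 240 fluorescent images; a different reading of steps 10 and 20 would change that count.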



In steps 2-9, nucleotides of the cDNAs are identified by hybridizing tag complements to the
encoded adaptors. Specifically hybridized tag complements are detected by exciting their
fluorescent labels with illumination beam (110) from Xenon arc lamp (126). In step 13, encoded
adaptors and T4 DNA ligase (Promega, Madison, WI) at about 0.75 units per µL are passed
through the flow chamber at a flow rate of about 1-2 µL per minute for about 20-30 minutes at
16°C, after which the wash of step 14 is executed by flowing, in succession, a solution of
Pronase™ (Boehringer Mannheim, Indianapolis, IN), a salt wash solution, and an ethanol wash
solution through the flow chamber, all at the same flow rate of 1-2 µL per minute and for
durations of 15, 10, and 10 minutes, respectively. The salt wash solution is 150 mM NaCl and
10 mM Tris-HCl (pH 8.5), and the ethanol wash solution is a 3:1 (v/v) mixture of the salt wash
solution and ethanol. The ligation and wash steps 13 and 14 are repeated once, after which the
adaptors and the cDNAs are prepared for second-strand ligation by passing T4 DNA kinase (New
England Biolabs, Beverly, MA) at 7 units per µL through the flow chamber at 31°C with a flow
rate of 1-2 µL per minute for 15-20 minutes. Ligation of the second strand is carried out by
flowing T4 DNA ligase (0.75 units per µL, Promega) through the flow chamber for 20-30 minutes
at a rate of 1-2 µL per minute, followed by Pronase™ treatment and washing as described above.
Tag complements at 25 nM concentration are passed through the flow chamber at a flow rate of
1-2 µL per minute for 10 minutes at 20°C, after which the fluorescent labels carried by the tag
complements are illuminated and fluorescence is collected. The tag complements are melted from
the encoded adaptors by passing NEB #2 restriction buffer with 3 mM MgCl2 through the flow
chamber at a flow rate of 1-2 µL per minute at 55°C for 10 minutes. Encoded adaptors are cleaved
from the cDNAs by passing Bbv I (New England Biolabs, Beverly, MA) at 1 unit/µL at a flow
rate of 1-2 µL per minute for 20 minutes at 37°C, followed by Pronase™ treatment and washing,
as described above.
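Because every flow step is specified as a rate and a duration, the reagent volume each step consumes follows directly. The sketch tabulates the steps listed above and computes a volume range (minimum rate times minimum time, maximum rate times maximum time). It does not model how often each step recurs in a full run; the data layout and names are illustrative.

```python
from typing import Tuple

Range = Tuple[float, float]

# Flow steps from the text: (description, flow rate µL/min, duration min,
# temperature °C, or None where the text gives no temperature).
STEPS = [
    ("top-strand ligation, T4 DNA ligase 0.75 U/uL", (1, 2), (20, 30), 16),
    ("Pronase wash", (1, 2), (15, 15), None),
    ("salt wash", (1, 2), (10, 10), None),
    ("ethanol wash", (1, 2), (10, 10), None),
    ("T4 DNA kinase, 7 U/uL", (1, 2), (15, 20), 31),
    ("second-strand ligation, 0.75 U/uL", (1, 2), (20, 30), None),
    ("tag complements, 25 nM", (1, 2), (10, 10), 20),
    ("melt, NEB #2 + 3 mM MgCl2", (1, 2), (10, 10), 55),
    ("Bbv I cleavage, 1 U/uL", (1, 2), (20, 20), 37),
]


def volume_range(rate: Range, minutes: Range) -> Range:
    """Lowest and highest volume (µL) passed: rate times duration."""
    return (rate[0] * minutes[0], rate[1] * minutes[1])


for name, rate, minutes, temp in STEPS:
    lo, hi = volume_range(rate, minutes)
    where = "temperature not stated" if temp is None else f"{temp} C"
    print(f"{name:45s} {lo:>3g}-{hi:<3g} uL  ({where})")

# Enzyme delivered in the Bbv I step: 1 unit/µL times the volume passed.
bbv_volume = volume_range((1, 2), (20, 20))
bbv_units = tuple(1 * v for v in bbv_volume)
```

Multiplying the same volume range by an enzyme concentration, as done for Bbv I, gives the units of enzyme consumed per step.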


